(54) Format conversion of graphical image data words



(57) Data components in a first, processing format, each of
which includes a selected portion which represents the data
component in a second, display format, are merged to form an
interleaved data word in which the selected portions of the data
components are grouped. For example, two pixel components, which
are represented in a two-byte format in which the least
significant byte represents each pixel component in a one-byte
format, are merged to form a four-byte interleaved word in which
the first two bytes are the most significant bytes of the pixel
components in the two-byte format and in which the next two
bytes are the least significant bytes of the pixel components in
the two-byte format. Since the least significant bytes of the
pixel components in the two-byte format are equivalent to the
two pixel components represented in the one-byte format, the two
pixel components are effectively converted to a two-byte word in
which each pixel component is represented in the one-byte
format. A merge computer instruction is capable of interleaving
respective bytes of two four-byte words and is used once to
group most significant bytes and least significant bytes of
first and second pixel components represented in a two-byte
format and to group most significant bytes and least significant
bytes of third and fourth pixel components represented in the
two-byte format, and a second time to group the most significant
bytes of the first, second, third, and fourth pixel components
and to group the least significant bytes of the first, second,
third, and fourth pixel components. The least significant bytes
of the first, second, third, and fourth pixel components
represent the first, second, third, and fourth pixel components
in a one-byte format and are stored as the respective pixel
components in the one-byte format. Thus, four pixel components
are converted from a two-byte format to a one-byte format using
only two computer instructions. Eight contiguous bytes can be
accessed in a single read computer instruction or a single write
computer instruction. Accordingly, two read computer
instructions retrieve eight pixel components in a two-byte
format. The eight pixel components are converted to a one-byte
format using four merge computer instructions and are stored in
memory using a single write computer instruction. Accordingly, a
four-band graphical image which includes one million pixels can
be converted from a two-byte processing format to a one-byte
display format using one million read computer instructions,
one-half million merge computer instructions, and one-half
million write computer instructions.
Description

FIELD OF THE INVENTION

The present invention relates to graphical image processing in a
computer system and, in particular, to a particularly efficient
mechanism for recasting pixels of a graphical image which are
represented in a 16-bit format into pixels represented in an
8-bit format.


BACKGROUND OF THE INVENTION 

In many computer graphics systems in use today, individual
picture elements, i.e., pixels, of a graphical image are stored
in a particular format. For example, single-band greyscale
pixels are commonly stored as unsigned eight-bit integers, and
four-band color pixels are commonly stored as four contiguous
unsigned eight-bit integers. Graphical images, which are
generated using data representing a model and a computer process
such as a three-dimensional modeling system, frequently involve
complex numerical calculations. It is common for a graphical
image to be rendered while pixels of the graphical image are
represented in a format which provides greater precision than
the particular format in which displayed pixels are stored. For
example, in a computer graphics system in which each band of a
displayed pixel is stored as an eight-bit unsigned integer, each
band of a pixel is frequently stored as a sixteen-bit unsigned
integer during processing and is converted to an eight-bit
unsigned integer substantially immediately prior to display of
the pixel. Such format conversion of each band of a pixel is
generally referred to as recasting the pixel.

Recasting conventionally requires (i) loading from the memory of
a computer a single pixel or a single band of a pixel at a time,
(ii) converting the pixel or the band of the pixel to a display
format, and (iii) storing the converted pixel or band of a
pixel. Graphical images commonly have approximately one thousand
rows and approximately one thousand columns of pixels, i.e.,
approximately one million pixels, and color graphical images
typically include four bands per pixel. Therefore, recasting by
such conventional techniques typically involves approximately
four million load operations and approximately four million
store operations. In addition, the recasting of a pixel or a
band of a pixel typically requires at least one computer
instruction per pixel or per band of each pixel. Therefore,
approximately another four million computer instructions are
required to recast each band of each pixel of a typical
graphical image.
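
The conventional approach described above can be sketched as
follows. This is a minimal illustration only; the function name
recast_conventional is hypothetical, and the conversion step
assumes, as in the embodiment described later, that the display
value is the least significant byte of the sixteen-bit value.

```python
def recast_conventional(components):
    """Recast 16-bit components to 8-bit, one component at a time."""
    out = bytearray()
    for value in components:   # (i) load a single component
        low = value & 0xFF     # (ii) convert: keep the low-order byte (assumed)
        out.append(low)        # (iii) store the converted component
    return bytes(out)


# Four components cost four loads, four conversions, and four stores.
converted = recast_conventional([0x0012, 0x00FF, 0x0000, 0x0080])
```

Each component passes through the loop body once, which is where
the roughly four million loads, conversions, and stores for a
one-million-pixel, four-band image come from.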

Processing of graphical images typically requires substantial
processing resources. Requiring substantial processing resources
to recast the pixels of a graphical image into a display format
only adds to the processing resources required to render and
display a graphical image. Because of the substantial computer
system resources required for such graphical image recasting, a
need persists in the industry for ever increasing efficiency in
recasting of pixels or bands of pixels of graphical images from
a high-precision processing format to a space-efficient display
format.

SUMMARY OF THE INVENTION 

In accordance with the present invention, data components in a
first, processing format, each of which includes a selected
portion which represents the data component in a second, display
format, are merged to form an interleaved data word in which the
selected portions of data components are grouped. For example,
two pixel components, which are represented in a two-byte format
in which the least significant byte represents each pixel
component in a one-byte format, are merged to form a four-byte
interleaved word in which the first two bytes are the most
significant bytes of the pixel components in the two-byte format
and in which the next two bytes are the least significant bytes
of the pixel components in the two-byte format. Since the least
significant bytes of the pixel components in the two-byte format
are equivalent to the two pixel components represented in the
one-byte format, the two pixel components are effectively
converted to a two-byte word in which each pixel component is
represented in the one-byte format.

Further in accordance with the present invention, a merge
computer instruction is capable of interleaving respective bytes
of two four-byte words and is used once to group most
significant bytes and least significant bytes of first and
second pixel components represented in a two-byte format and to
group most significant bytes and least significant bytes of
third and fourth pixel components represented in the two-byte
format, and a second time to group the most significant bytes of
the first, second, third, and fourth pixel components and to
group the least significant bytes of the first, second, third,
and fourth pixel components. The least significant bytes of the
first, second, third, and fourth pixel components represent the
first, second, third, and fourth pixel components in a one-byte
format and are stored as the respective pixel components in the
one-byte format. Thus, four pixel components are converted from
a two-byte format to a one-byte format using only two computer
instructions.

Further in accordance with the present invention, eight
contiguous bytes can be accessed in a single read computer
instruction or a single write computer instruction. Accordingly,
two read computer instructions retrieve eight pixel components,
each of which is represented in a two-byte format. The eight
pixel components are converted to a one-byte format using four
merge computer instructions and are stored in memory using a
single eight-byte write computer instruction. Accordingly, a
four-band graphical image which includes one million pixels can
be converted from a two-byte processing format to a one-byte
display format using one million read computer instructions,
one-half million merge computer instructions, and one-half
million write computer instructions. Each write computer
instruction can require an additional move computer instruction
to form eight contiguous bytes of pixel data in an appropriate
form for storage. The present invention therefore represents a
significant improvement over conventional techniques.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a block diagram of a computer system which includes
an image processor which recasts graphical image data in
accordance with the present invention.

Figure 2 is a logic flow diagram illustrating the 
recasting of graphical image data by the image proces- 
sor of Figure 1 in accordance with the present invention. 

Figure 3 is a block diagram illustrating merge operations used
by the image processor of Figure 1 to recast graphical image
data in accordance with the present invention.

Figure 4 is a block diagram illustrating a merge operation
performed by a computer processor of Figure 1.

Figure 5 is a block diagram of the computer processor of
Figure 1 in greater detail.

DETAILED DESCRIPTION

In accordance with the present invention, data components in a
first, processing format, each of which includes a selected
portion which represents the data component in a second, display
format, are merged to form an interleaved data word in which the
selected portions of data components are grouped. For example,
two pixel components, which are represented in a two-byte format
in which the least significant byte represents each pixel
component in a one-byte format, are merged to form a four-byte
interleaved word in which the first two bytes are the most
significant bytes of the pixel components in the two-byte format
and in which the next two bytes are the least significant bytes
of the pixel components in the two-byte format. Since the least
significant bytes of the pixel components in the two-byte format
are equivalent to the two pixel components represented in the
one-byte format, the two pixel components are effectively
converted to a two-byte word in which each pixel component is
represented in the one-byte format.
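
The two-component example above can be illustrated with a short
sketch. The function name interleave_pair and the sample values
are hypothetical and chosen only for illustration; each component
is given most significant byte first.

```python
def interleave_pair(first: bytes, second: bytes) -> bytes:
    """Merge two 2-byte components (H, L) into the word H0 H1 L0 L1."""
    if len(first) != 2 or len(second) != 2:
        raise ValueError("each component must be exactly two bytes")
    return bytes([first[0], second[0], first[1], second[1]])


# Two components whose low bytes already hold the display values.
word = interleave_pair(bytes([0x00, 0x7F]), bytes([0x00, 0xC8]))

# The last two bytes of the interleaved word are the two components
# in the one-byte format.
display = word[2:]
```

After the merge, the two one-byte values sit next to each other
and can be stored together rather than one at a time.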


Hardware Components of the Image Processing System

To facilitate appreciation of the present invention, the
hardware components of the recasting system are briefly
described. Computer system 100 (Figure 1) includes a processor
102 and memory 104 which is coupled to processor 102 through a
bus 106. Processor 102 fetches from memory 104 computer
instructions and executes the fetched computer instructions.
Processor 102 also reads data from and writes data to memory 104
and sends data and control signals through bus 106 to one or
more computer display devices 120 in accordance with fetched and
executed computer instructions. Processor 102 is described in
greater detail below.

Memory 104 can include any type of computer memory and can
include, without limitation, randomly accessible memory (RAM),
read-only memory (ROM), and storage devices which include
storage media such as magnetic and/or optical disks. Memory 104
includes an image processor 110, which is a computer process
executing within processor 102 from memory 104. A computer
process is a collection of computer instructions and data which
collectively define a task performed by computer system 100. As
described more completely below, image processor 110 (i) reads
pixels in a processing format from processing buffer 112, (ii)
recasts the pixels in the processing format to pixels in a
display format, and (iii) stores the pixels in the display
format in display buffer 114.

Processing buffer 112 and display buffer 114 are stored in
memory 104. Processing buffer 112 stores data representing
pixels of a graphical image in a processing format. In one
embodiment, the processing format includes a sixteen-bit
unsigned integer to represent each band of each pixel. For
example, if the graphical image represented by processing buffer
112 is a single-band greyscale graphical image, each pixel of
the graphical image is represented by a single sixteen-bit
unsigned integer. Similarly, if the graphical image represented
by processing buffer 112 is a four-band color graphical image
whose bands are alpha, blue, green, and red, each pixel of the
graphical image is represented by four contiguous sixteen-bit
unsigned integers which represent alpha, blue, green, and red
components of the pixel.
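
As an illustration of this layout, the sketch below packs one
four-band pixel into eight contiguous bytes. The helper name
pack_pixel_16 is hypothetical, and big-endian byte order (most
significant byte first) is an assumption chosen to match the
H/L byte ordering used in the description that follows.

```python
import struct


def pack_pixel_16(alpha: int, blue: int, green: int, red: int) -> bytes:
    """Pack four bands as contiguous 16-bit unsigned integers.

    ">4H" means big-endian, four unsigned 16-bit values (assumed order).
    """
    return struct.pack(">4H", alpha, blue, green, red)


# One four-band pixel occupies eight bytes in the processing format.
pixel = pack_pixel_16(255, 16, 32, 64)
```

Because every band here is at most 255, each high byte is zero
and each low byte already equals the eight-bit display value.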

Display buffer 114 can be any graphical image buffer used in
graphical image processing. For example, display buffer 114 can
be a Z buffer which is used in a conventional manner to remove
hidden surfaces from a rendered graphical image. Alternatively,
display buffer 114 can be a frame buffer whose contents are
immediately displayed in one of computer display devices 120.
Each of computer display devices 120 can be any type of computer
display device including without limitation a printer, a cathode
ray tube (CRT), a light-emitting diode (LED) display, or a
liquid crystal display (LCD). Each of computer display devices
120 receives from processor 102 control signals and data and, in
response to such control signals, displays the received data.
Computer display devices 120, and the control thereof by
processor 102, are conventional.

The display format is a format of the data which is suitable for
receipt and display of the data by one or more of computer
display devices 120. In one embodiment, the display format
includes an eight-bit unsigned integer to represent each band of
each pixel. For example, if the graphical image represented by
display buffer 114 is a single-band greyscale graphical image,
each pixel of the graphical image is represented by a single
eight-bit unsigned integer. Similarly, if the graphical image
represented by display buffer 114 is a four-band color graphical
image whose bands are alpha, blue, green, and red, each pixel of
the graphical image is represented by four contiguous eight-bit
unsigned integers which represent alpha, blue, green, and red
components of the pixel.

The recasting of pixels from the processing format in processing
buffer 112 to the display format in display buffer 114 by image
processor 110 is illustrated as logic flow diagram 200
(Figure 2). Processing according to logic flow diagram 200
begins with loop step 202. Loop step 202 and next step 216
define a loop in which image processor 110 (Figure 1) processes
each band of each pixel of processing buffer 112 according to
steps 204-214. Eight pixel components represented in processing
buffer 112 are processed in a single iteration of the loop
defined by loop step 202 and next step 216. For example, if the
graphical image represented in processing buffer 112 is a
single-band greyscale graphical image, eight pixels are
processed in a single iteration of the loop defined by loop step
202 and next step 216. On the other hand, if the graphical image
represented in processing buffer 112 is a four-band color
graphical image, eight pixel components which collectively
represent two pixels are processed in a single iteration of the
loop defined by loop step 202 and next step 216. Eight
components are processed in each iteration of the loop defined
by steps 202 and 216 in this illustrative embodiment because the
largest single write operation which can be performed by
processor 102 (Figure 1) can write eight components in the
display format to display buffer 114 at once. For each eight of
the components of the pixels of processing buffer 112,
processing transfers from loop step 202 to step 204.

In step 204, image processor 110 (Figure 1) reads eight pixel
components in the processing format from processing buffer 112.
Processor 102 performs a read operation in which sixteen
contiguous bytes of data can be read from memory 104. Image
processor 110 invokes the read operation and causes processor
102 to perform a data alignment operation which shifts the read
data such that the byte representing the first of the eight
pixel components of processing buffer 112 to be processed
according to the current iteration of the loop defined by loop
step 202 (Figure 2) and next step 216 is aligned on an
eight-byte boundary. The first eight bytes of the aligned data
represent four pixel components in the processing format, e.g.,
four pixel components represented by sixteen-bit unsigned
integers. The second four pixel components processed in the
current iteration of the loop defined by steps 202 and 216 are
read from processing buffer 112 in a second read operation and a
second, corresponding data alignment operation.

In a preferred embodiment, image processor 110 (Figure 1)
determines whether the first sixteen bytes of data read in step
204 (Figure 2) are already aligned on an eight-byte boundary
prior to performing the data alignment operation. If the sixteen
bytes of data are already so aligned, image processor 110
(Figure 1) does not perform the data alignment operation and the
data read in a single read operation represents all eight pixel
components.

While data representing eight pixel components in the processing
format are retrieved substantially simultaneously, data
representing four pixel components are converted from the
processing format to the display format substantially
simultaneously. Thus, eight contiguous bytes representing the
first four pixel components read from processing buffer 112 are
stored in data double word 302 (Figure 3) of image processor 110
(Figure 1). Data double word 302 (Figure 3) includes eight
partitioned bytes H0, L0, H1, L1, H2, L2, H3, and L3. Bytes H0
and L0 represent most significant and least significant bytes of
the first pixel component. Similarly, bytes H1 and L1 represent
most significant and least significant bytes of the second pixel
component; bytes H2 and L2 represent most significant and least
significant bytes of the third pixel component; and bytes H3 and
L3 represent most significant and least significant bytes of the
fourth pixel component. In data double word 302, each of the
four pixel components is processed such that the least
significant byte of each pixel component in processing format is
equivalent to the same pixel component in display format. In one
embodiment, processing of pixel components while stored in
processing buffer 112 (Figure 1) scales the pixel components
such that the least significant portion of each pixel component
represents the component in the display format. Since processing
of pixel components typically involves scaling pixel components,
the scale factor can be adjusted such that the result of such
processing is a pixel component whose least significant portion
accurately represents the pixel component in the display format.
In this illustrative embodiment, pixel components are processed
in the processing format of sixteen-bit unsigned integers but
are scaled during processing to have a value in the range of
zero to 255 which is represented by the least significant eight
bits of the pixel component. As a result, the most significant
portion of the pixel component in processing format, e.g., the
eight most significant bits in this illustrative embodiment, is
zero.
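
The effect of such scaling can be sketched as follows. The
description does not give a specific scale factor, so the
mapping below (a full-range sixteen-bit value scaled by 255/65535
with rounding) and the name scale_to_display_range are
assumptions made purely for illustration.

```python
import struct


def scale_to_display_range(value: int) -> int:
    """Map a value in 0..65535 onto 0..255 (assumed scale factor)."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value must fit in sixteen unsigned bits")
    # Integer arithmetic with rounding to the nearest display level.
    return (value * 255 + 32767) // 65535


scaled = scale_to_display_range(32768)

# Stored back as a 16-bit big-endian word, the high byte is zero and
# the low byte carries the display value.
high_byte, low_byte = struct.pack(">H", scaled)
```

Once every component has been scaled this way, dropping the most
significant byte loses no information, which is what makes the
merge-based conversion valid.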

In an alternative embodiment, partitioned arithmetic operations
are performed by processor 102 (Figure 1) on data double word
302 (Figure 3) to scale each of the four pixel components
represented in data double word 302 substantially simultaneously
such that the least significant portion of each of the pixel
components represents the pixel component in the display format.
Such partitioned operations are described more completely, for
example, in (i) United States patent application serial number
08/236,572 by Timothy J. Van Hook, Leslie Dean Kohn, and Robert
Yung, filed April 29, 1994 and entitled "A Central Processing
Unit with Integrated Graphics Functions" (the '572 application)
and (ii) United States patent application serial number
08/398,111 by Chang-Guo Zhou and Daniel S. Rice, filed March 3,
1995 and entitled "Color Format Conversion in a Parallel
Processor" (the '111 application), both of which are
incorporated in their entirety herein by reference.

In step 204 (Figure 2), image processor 110 (Figure 1) stores
the second four pixel components in a data double word 312
(Figure 3) in a directly analogous manner to that described
above with respect to data double word 302. Processing transfers
from step 204 (Figure 2) to step 206.

In step 206, image processor 110 (Figure 1) merges bytes H0
(Figure 3), L0, H1, and L1 with bytes H2, L2, H3, and L3 using a
PMERGE operation 304 which is performed by processor 102
(Figure 1) and is illustrated in Figure 4. Data word 402 is 32
bits in length and includes four partitioned bytes 402A-D.
Similarly, data word 404 is 32 bits in length and includes four
partitioned bytes 404A-D. The PMERGE operation interleaves
respective bytes of data words 402 and 404 into a double data
word 406 as shown. Double data word 406 is 64 bits in length and
includes eight partitioned bytes 406A-H. The result of PMERGE
operation 304 (Figure 3) is data double word 306 which is 64
bits in length and whose eight partitioned bytes have the
following values: H0, H2, L0, L2, H1, H3, L1, and L3. Processing
transfers from step 206 (Figure 2) to step 208.
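
The byte interleaving performed by the PMERGE operation can be
modeled in software as shown below. This is only a sketch of the
behavior described above, not of the processor instruction
itself; the function name pmerge and the sample byte values are
hypothetical.

```python
def pmerge(word_a: bytes, word_b: bytes) -> bytes:
    """Interleave two 4-byte words into a0 b0 a1 b1 a2 b2 a3 b3."""
    if len(word_a) != 4 or len(word_b) != 4:
        raise ValueError("pmerge expects two four-byte words")
    merged = bytearray()
    for byte_a, byte_b in zip(word_a, word_b):
        merged.append(byte_a)
        merged.append(byte_b)
    return bytes(merged)


# Stand-in values: high bytes 0xA0..0xA3, low bytes 0x10..0x13.
H0, L0, H1, L1 = 0xA0, 0x10, 0xA1, 0x11
H2, L2, H3, L3 = 0xA2, 0x12, 0xA3, 0x13

# Step 206: merge the upper and lower halves of data double word 302.
word_306 = pmerge(bytes([H0, L0, H1, L1]), bytes([H2, L2, H3, L3]))
```

The result has the byte order H0, H2, L0, L2, H1, H3, L1, L3, as
stated for data double word 306.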

In step 208, image processor 110 (Figure 1) merges upper four
bytes 306H (Figure 3) of data double word 306 and lower four
bytes 306L of data double word 306 using a PMERGE operation 308,
which is directly analogous to PMERGE operation 304 described
above. The result of PMERGE operation 308 is double data word
310 which is 64 bits in length and whose eight partitioned bytes
have the following values: H0, H1, H2, H3, L0, L1, L2, and L3.
Processing transfers from step 208 (Figure 2) to step 210.
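
Tracing byte labels through steps 206 and 208 shows why two
merges are enough to group the low bytes. The sketch below uses
strings as stand-ins for the partitioned bytes; the helper name
interleave is hypothetical.

```python
def interleave(upper, lower):
    """Alternate items from two equal-length halves, upper first."""
    return [item for pair in zip(upper, lower) for item in pair]


word_302 = ["H0", "L0", "H1", "L1", "H2", "L2", "H3", "L3"]

# Step 206: first merge of the two four-byte halves.
word_306 = interleave(word_302[:4], word_302[4:])

# Step 208: second merge, applied to the halves of the first result.
word_310 = interleave(word_306[:4], word_306[4:])

# The lower half of the final word holds the display-format bytes.
display_bytes = word_310[4:]
```

The first merge places bytes of the same kind in adjacent pairs,
and the second merge gathers those pairs into two runs of four.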

In step 210, image processor 110 (Figure 1) merges the second
four pixel components stored in data double word 312 using a
PMERGE operation 314 in a directly analogous manner to that
described above with respect to step 206 (Figure 2) to produce
data double word 316 whose eight partitioned bytes are H4
(Figure 3), H6, L4, L6, H5, H7, L5, and L7. Processing transfers
to step 212 (Figure 2) in which image processor 110 (Figure 1)
merges upper four bytes 316H (Figure 3) and lower four bytes
316L of data double word 316 representing the second four pixel
components in a directly analogous manner to that described
above with respect to step 208 (Figure 2). The result of PMERGE
operation 318 (Figure 3) is double data word 320 which is 64
bits in length and whose eight partitioned bytes have the
following values: H4, H5, H6, H7, L4, L5, L6, and L7.

As described above, the least significant byte of each of the
pixel components in the processing format accurately represents
the pixel component in the display format. Since bytes L0, L1,
L2, and L3 are the least significant bytes of the first four
pixel components retrieved in step 204 (Figure 2), bytes L0
(Figure 3), L1, L2, and L3 accurately represent the first four
pixel components in the display format. Similarly, bytes L4, L5,
L6, and L7 are the least significant bytes of the second four
pixel components retrieved in step 204 (Figure 2) and therefore
accurately represent the second four pixel components in the
display format. In step 214, image processor 110 (Figure 1)
writes to display buffer 114 lower four bytes 310L (Figure 3) of
data double word 310 and lower four bytes 320L of data double
word 320, which collectively form data double word 322 whose
eight partitioned bytes have the values L0, L1, L2, L3, L4, L5,
L6, and L7. In one embodiment, image processor 110 (Figure 1)
combines lower four bytes 310L (Figure 3) of data double word
310 and lower four bytes 320L of data double word 320 to form
data double word 322 in a single computer instruction prior to
writing data double word 322 to display buffer 114 (Figure 1).
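
Steps 204 through 214 for one group of eight components can be
modeled end to end as follows. This is a software sketch under
stated assumptions: the input is sixteen aligned bytes in
big-endian order, and the names pmerge, low_bytes, and
recast_eight are hypothetical.

```python
import struct


def pmerge(word_a: bytes, word_b: bytes) -> bytes:
    """Interleave respective bytes of two four-byte words."""
    return bytes(value for pair in zip(word_a, word_b) for value in pair)


def low_bytes(double_word: bytes) -> bytes:
    """Two merges, then keep the lower four bytes (L0..L3)."""
    stage_one = pmerge(double_word[:4], double_word[4:])  # steps 206/210
    stage_two = pmerge(stage_one[:4], stage_one[4:])      # steps 208/212
    return stage_two[4:]


def recast_eight(source: bytes) -> bytes:
    """Convert eight 16-bit components into eight 8-bit components."""
    if len(source) != 16:
        raise ValueError("expected sixteen bytes (eight components)")
    # Step 214: the two lower halves together form the eight-byte result.
    return low_bytes(source[:8]) + low_bytes(source[8:])


source = struct.pack(">8H", 1, 2, 3, 4, 250, 251, 252, 253)
result = recast_eight(source)
```

The two slices of the source stand in for the two read
operations, and the concatenation stands in for forming data
double word 322 before the single write.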

Thus, eight pixel components are converted from a processing
format to a display format using only two read operations and a
single write operation. In addition, four pixel components are
converted from the processing format to the display format in
only two PMERGE operations. Accordingly, one million four-band
color pixels in processing format in processing buffer 112 can
be converted to display format in display buffer 114 using only
one million read operations, 500,000 write operations, and
500,000 PMERGE operations. By contrast, conventional conversion
techniques typically require four million read operations, four
million write operations, and at least four million operations
to convert each pixel component. Therefore, the present
invention represents a significant improvement over conventional
graphical image format conversion techniques.

As described above, storage of pixels in display buffer 114 can
result immediately or indirectly in display of such pixels in
one or more of computer display devices 120. From step 214
(Figure 2), processing transfers through next step 216 to loop
step 202 in which the next eight pixel components stored in
processing buffer 112 are processed according to steps 204-214.
Once all pixel components stored in processing buffer 112 have
been processed according to the loop of loop step 202 and next
step 216, processing according to logic flow diagram 200
completes.

While it is generally described that all pixel components stored
in processing buffer 112 (Figure 1) are processed, eight pixel
components per iteration of the loop of loop step 202 (Figure 2)
and next step 216, some buffers do not necessarily store pixels
of sequential scanlines contiguously. Therefore, in a preferred
embodiment, image processor 110 (Figure 1) processes in each
iteration of the loop of loop step 202 (Figure 2) and next step
216 eight pixel components of a particular scanline stored
within processing buffer 112 (Figure 1). In this preferred
embodiment, image processor 110 processes each scanline of
processing buffer 112 in sequence.

It is appreciated that scanlines of a particular graphical image
represented by processing buffer 112 sometimes have a number of
pixel components which is not evenly divisible by eight. In such
circumstances, image processor 110 processes one, two, three,
four, five, six, or seven pixel components stored within
processing buffer 112 in the manner described above with respect
to steps 204-214 (Figure 2) while ignoring excess bytes of data
double words 302 (Figure 3), 306, 310, 312, 316, 320, and 322.
For example, if scanlines of a graphical image represented
within processing buffer 112 include a number of pixel
components which is one more than an integer multiple of eight,
one pixel component stored within processing buffer 112 is
processed in the following manner.

Image processor 110 reads one pixel component from processing
buffer 112 and stores the read pixel component as bytes H0 and
L0 in data double word 302 (Figure 3). Bytes H1, L1, H2, L2, H3,
L3, H4, L4, H5, L5, H6, L6, H7, and L7 are ignored. PMERGE
operations 304 and 308 are executed in the manner described
above. As a result, byte L0 is the most significant byte of data
double word 322 and is stored in display buffer 114 (Figure 1)
by image processor 110. Bytes L1-7 (Figure 3) of data double
word 322 are ignored.
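
Handling of a scanline whose component count is not a multiple
of eight can be sketched as below. Padding the final partial
group with zero bytes and then discarding the excess output is
one way to model "ignoring excess bytes"; it is an assumption of
this sketch, as are the names pmerge, low_bytes, and
recast_scanline.

```python
import struct


def pmerge(word_a: bytes, word_b: bytes) -> bytes:
    """Interleave respective bytes of two four-byte words."""
    return bytes(value for pair in zip(word_a, word_b) for value in pair)


def low_bytes(double_word: bytes) -> bytes:
    """Apply two merges and keep the four grouped low bytes."""
    stage_one = pmerge(double_word[:4], double_word[4:])
    return pmerge(stage_one[:4], stage_one[4:])[4:]


def recast_scanline(source: bytes) -> bytes:
    """Recast a scanline of big-endian 16-bit components to 8-bit."""
    if len(source) % 2:
        raise ValueError("scanline must hold whole 16-bit components")
    count = len(source) // 2
    out = bytearray()
    for offset in range(0, len(source), 16):
        # Pad a short final group; the padded bytes are ignored later.
        group = source[offset:offset + 16].ljust(16, b"\x00")
        out += low_bytes(group[:8]) + low_bytes(group[8:])
    return bytes(out[:count])  # drop the excess bytes of the last group


# Nine components: one full group of eight plus one left over.
scanline = struct.pack(">9H", *range(10, 19))
result = recast_scanline(scanline)
```

Only the first byte of the final eight-byte result is kept for
the leftover component, matching the case described above in
which L0 alone is stored.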

Processor 102 


Processor 102 is shown in greater detail in Figure 5 and is
described briefly herein and more completely in the '572
application. Processor 102 includes a prefetch and dispatch unit
(PDU) 46, an instruction cache 40, an integer execution unit
(IEU) 30, an integer register file 36, a floating point unit
(FPU) 26, a floating point register file 38, and a graphics
execution unit (GRU) 28, coupled to each other as shown.
Additionally, processor 102 includes two memory management units
(IMMU & DMMU) 44a-44b, and a load and store unit (LSU) 48, which
in turn includes data cache 120, coupled to each other and the
previously described elements as shown. Together, the components
of processor 102 fetch, dispatch, execute, and save execution
results of computer instructions, e.g., computer instructions of
image processor 110 (Figure 1), in a pipelined manner.

PDU 46 (Figure 5) fetches instructions from memory 104
(Figure 1) and dispatches the instructions to IEU 30 (Figure 5),
FPU 26, GRU 28, and LSU 48 accordingly. Prefetched instructions
are stored in instruction cache 40. IEU 30, FPU 26, and GRU 28
perform integer, floating point, and graphics operations,
respectively. In general, the integer operands and results are
stored in integer register file 36, whereas the floating point
and graphics operands and results are stored in floating point
register file 38. Additionally, IEU 30 also performs a number of
graphics operations, and appends address space identifiers (ASI)
to addresses of load/store instructions for LSU 48, identifying
the address spaces being accessed. LSU 48 generates addresses
for all load and store operations. The LSU 48 also supports a
number of load and store operations specifically designed for
graphics data. Memory references are made in virtual addresses.
MMUs 44a-44b map virtual addresses to physical addresses.

PDU 46, IEU 30, FPU 26, integer and floating point register
files 36 and 38, MMUs 44a-44b, and LSU 48 can be coupled to one
another in any of a number of configurations as described more
completely in the '572 application. As described more completely
in the '572 application with respect to Figures 8a-8d thereof,
GRU 28 performs a number of distinct partitioned multiplication
operations and partitioned addition operations. Various
partitioned operations used by image processor 110 (Figure 1)
are described more completely below.

As described above, processor 102 includes four (4) separate
processing units, i.e., LSU 48, IEU 30, FPU 26, and GRU 28. Each
of these processing units is described more completely in the
'572 application. These processing units operate in parallel and
can each execute a respective computer instruction while others
of the processing units execute a different computer
instruction. GRU 28 executes the PMERGE operations described
above.

In one embodiment, processor 102 is the UltraSPARC processor and
computer system 100 (Figure 1) is the UltraSPARCstation, both of
which are available from Sun Microsystems, Inc. of Mountain
View, California. Sun, Sun Microsystems, and the Sun Logo are
trademarks or registered trademarks of Sun Microsystems, Inc. in
the United States and other countries. All SPARC trademarks are
used under license and are trademarks of SPARC International,
Inc. in the United States and other countries. Products bearing
SPARC trademarks are based upon an architecture developed by Sun
Microsystems, Inc.

Claims 

1. A method for converting a first data word which
includes at least two data components in a first data
format to a second data word which includes the at
least two data components in a second data format,
the method comprising:

interleaving (i) a first portion of the first data
word which includes a first one of the at least
two data components and (ii) a second portion
of the first data word which includes a second
one of the at least two data components to form
an interleaved data word which includes a
selected portion of the first data component
which is adjacent to a selected portion of the
second data component, wherein the selected
portions of the first and second data components
represent the first and second data components
in the second data format; and

including the selected portions of the first and
second data components from the interleaved
data word in the second data word.
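
By way of non-limiting illustration, the interleaving and including steps
recited above can be sketched in software for two components, each held in
a two-byte format whose least significant byte is the one-byte
representation. The Python names interleave and recast_two are
hypothetical and are chosen for this sketch only; most-significant-byte-
first order within each component is assumed.

```python
def interleave(first_portion: bytes, second_portion: bytes) -> bytes:
    """Interleave two equal-length portions byte by byte (hypothetical helper)."""
    if len(first_portion) != len(second_portion):
        raise ValueError("portions must have the same length")
    return bytes(b for pair in zip(first_portion, second_portion) for b in pair)


def recast_two(first_word: bytes) -> bytes:
    """Convert two two-byte components (4 bytes) to two one-byte components."""
    if len(first_word) != 4:
        raise ValueError("expected four bytes (two two-byte components)")
    # Interleaving step: the result is H1 H2 L1 L2, so the two least
    # significant bytes (the selected portions) are adjacent.
    interleaved = interleave(first_word[:2], first_word[2:])
    # Including step: the adjacent selected portions form the second data word.
    return interleaved[2:]


# Components 0x007F and 0x0080 in the two-byte format.
second_word = recast_two(bytes([0x00, 0x7F, 0x00, 0x80]))
```

Here the second data word holds the two components in the one-byte format,
in their original order.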


2. The method of Claim 1 wherein the step of interleaving
is performed by a computer processor in a
single instruction cycle of the computer processor.

3. The method of Claim 1 further comprising:

reading the first data word from a buffer stored 
in a memory of the computer. 

4. The method of Claim 1 wherein the selected portion
of the first data component is a least significant portion
of the first data word; and

further wherein the selected portion of the second
data component is a least significant portion
of the first second data word.

5. The method of Claim 1 further comprising: 

storing the second data word in a destination
buffer in a memory of a computer.

6. The method of Claim 1 wherein the first portion of
the first data word further includes a third one of the
at least two data components;

(a) further wherein the second portion of the
first data word further includes a fourth one of
the at least two data components;

(b) further wherein the interleaved data word
further includes a selected portion of the third
data component and a selected portion of the
fourth data component, the selected portions of
the third and fourth data components being
adjacent to one another within the interleaved
word;

(c) further wherein the selected portions of the
third and fourth data components represent the
third and fourth data components in the second
data format;

(d) further wherein the method further comprises:

(i) interleaving a first portion of the interleaved
word with a second portion of the
interleaved word to form a second interleaved
word in which the selected portions
of the first, second, third, and fourth data
components are substantially contiguous;
and

(e) further wherein the step of including further
comprises:

(i) including the selected portions of the
third and fourth data components in the
second data word.

7. A computer program product which includes a computer
usable medium having computer readable
code embodied therein for converting a first data
word which includes at least two data components
in a first data format to a second data word which
includes the at least two data components in a second
data format, the computer readable code comprising:

a merge module configured to interleave (i) a
first portion of the first data word which
includes a first one of the at least two data components
and (ii) a second portion of the first
data word which includes a second one of the
at least two data components to form an interleaved
data word which includes a selected
portion of the first data component adjacent to
a selected portion of the second data component,
wherein the selected portions of the first
and second data components represent the
first and second data components in the second
data format; and

a data selection module operatively coupled to
the merge module and configured to include
the selected portions of the first and second
data components from the interleaved data
word in the second data word.

8. The computer program product of Claim 7 wherein 
the merge module is further configured to inter- 
leave the first and second portions of the first data 
word in a single instruction cycle of a computer 
processor. 

9. The computer program product of Claim 7 wherein 
the computer readable code further comprises: 

a data component retrieving module operatively
coupled to the merge module and configured
to read the first data word from a buffer
stored in a memory of a computer.

10. The computer program product of Claim 7 wherein
the selected portion of the first data component is a
least significant portion of the first data word; and

further wherein the selected portion of the second
data component is a least significant portion
of the first second data word.

11. The computer program product of Claim 7 wherein
the computer readable code further comprises:

a data component storage module operatively
coupled to the data selection module and configured
to store the second data word in a destination
buffer in a memory of a computer.

12. The computer program product of Claim 7 wherein
the first portion of the first data word further
includes a third one of the at least two data components;

(a) further wherein the second portion of the
first data word further includes a fourth one of
the at least two data components;

(b) further wherein the interleaved data word
further includes a selected portion of the third
data component and a selected portion of the
fourth data component, the selected portions of
the third and fourth data components being
adjacent to one another within the interleaved
word;

(c) further wherein the selected portions of the
third and fourth data components represent the
third and fourth data components in the second
data format;

(d) further wherein the computer readable code
further comprises:

(i) a second merge module different from
the first-mentioned merge module, operatively
coupled to the first merge module
and the data selection module, and configured
to interleave a first portion of the interleaved
word with a second portion of the
interleaved word to form a second interleaved
word in which the selected portions
of the first, second, third, and fourth data
components are substantially contiguous;
and

(e) further wherein the data selection module is
further configured to include the selected portions
of the third and fourth data components in
the second data word.

13. A data recaster for converting a first data word
which includes at least two data components in a
first data format to a second data word which
includes the at least two data components in a second
data format, the data recaster comprising:

a merge module configured to interleave (i) a
first portion of the first data word which
includes a first one of the at least two data components
and (ii) a second portion of the first
data word which includes a second one of the
at least two data components to form an interleaved
data word which includes a selected
portion of the first data component adjacent to
a selected portion of the second data component,
wherein the selected portions of the first
and second data components represent the
first and second data components in the second
data format; and

a data selection module operatively coupled to
the merge module and configured to include
the selected portions of the first and second
data components from the interleaved data
word in the second data word.



14. The data recaster of Claim 13 wherein the merge
module is further configured to interleave the first
and second portions of the first data word in a single
instruction cycle of a computer processor.

15. The data recaster of Claim 13 further comprising: 

a data component retrieving module operatively
coupled to the merge module and configured
to read the first data word from a buffer
stored in a memory of a computer.

16. The data recaster of Claim 13 wherein the selected
portion of the first data component is a least significant
portion of the first data word; and

further wherein the selected portion of the second
data component is a least significant portion
of the first second data word.

17. The data recaster of Claim 13 further comprising:

a data component storage module operatively
coupled to the data selection module and configured
to store the second data word in a destination
buffer in a memory of a computer.

18. The data recaster of Claim 13 wherein the first portion
of the first data word further includes a third
one of the at least two data components;

(a) further wherein the second portion of the
first data word further includes a fourth one of
the at least two data components;

(b) further wherein the interleaved data word
further includes a selected portion of the third
data component and a selected portion of the
fourth data component, the selected portions of
the third and fourth data components being
adjacent to one another within the interleaved
word;

(c) further wherein the selected portions of the
third and fourth data components represent the
third and fourth data components in the second
data format;

(d) further wherein the data recaster further
comprises:

(i) a second merge module different from
the first-mentioned merge module, operatively
coupled to the first merge module
and the data selection module, and configured
to interleave a first portion of the interleaved
word with a second portion of the
interleaved word to form a second interleaved
word in which the selected portions
of the first, second, third, and fourth data
components are substantially contiguous;
and

(e) further wherein the data selection module is
further configured to include the selected portions
of the third and fourth data components in
the second data word.

19. A computer system comprising:

a memory;

a computer processor operatively coupled to
the memory; and

a data recaster stored in the memory and
which includes at least one computer instructions
which are executed within the computer
processor to convert a first data word which
includes at least two data components in a first
data format to a second data word which
includes the at least two data components in a
second data format, the data recaster comprising:

a merge module configured to interleave (i)
a first portion of the first data word which
includes a first one of the at least two data
components and (ii) a second portion of
the first data word which includes a second
one of the at least two data components to
form an interleaved data word which
includes a selected portion of the first data
component adjacent to a selected portion
of the second data component, wherein
the selected portions of the first and second
data components represent the first
and second data components in the second
data format; and

a data selection module operatively coupled
to the merge module and configured
to include the selected portions of the first
and second data components from the
interleaved data word in the second data
word.

20. The computer system of Claim 19 wherein the
merge module is further configured to interleave
the first and second portions of the first data word in
a single instruction cycle of the computer processor.

21. The computer system of Claim 19 wherein the data
recaster further comprises:

a data component retrieving module operatively
coupled to the merge module and configured
to read the first data word from a buffer
stored in the memory.

22. The computer system of Claim 19 wherein the
selected portion of the first data component is a
least significant portion of the first data word; and

further wherein the selected portion of the second
data component is a least significant portion
of the first second data word.

23. The computer system of Claim 19 wherein the data
recaster further comprises:

a data component storage module operatively
coupled to the data selection module and configured
to store the second data word in a destination
buffer in the memory.

24. The computer system of Claim 19 wherein the first
portion of the first data word further includes a third
one of the at least two data components;

(a) further wherein the second portion of the
first data word further includes a fourth one of
the at least two data components;

(b) further wherein the interleaved data word
further includes a selected portion of the third
data component and a selected portion of the
fourth data component, the selected portions of
the third and fourth data components being
adjacent to one another within the interleaved
word;

(c) further wherein the selected portions of the
third and fourth data components represent the
third and fourth data components in the second
data format;

(d) further wherein the data recaster further
comprises:

(i) a second merge module different from
the first-mentioned merge module, operatively
coupled to the first merge module
and the data selection module, and configured
to interleave a first portion of the interleaved
word with a second portion of the
interleaved word to form a second interleaved
word in which the selected portions
of the first, second, third, and fourth data
components are substantially contiguous;
and

(e) further wherein the data selection module is
further configured to include the selected portions
of the third and fourth data components in
the second data word.


25. A system for distributing code (i) which is stored on
a computer-readable medium, (ii) which is executable
by a computer, and (iii) which includes at least
one module, each of which in turn is configured to
carry out at least one function to be executed by the
computer, the at least one function including converting
a first data word which includes at least two
data components in a first data format to a second
data word which includes the at least two data components
in a second data format, the system comprising:

a merge module configured to interleave (i) a
first portion of the first data word which
includes a first one of the at least two data components
and (ii) a second portion of the first
data word which includes a second one of the
at least two data components to form an interleaved
data word which includes a selected
portion of the first data component adjacent to
a selected portion of the second data component,
wherein the selected portions of the first
and second data components represent the
first and second data components in the second
data format; and

a data selection module operatively coupled to
the merge module and configured to include
the selected portions of the first and second
data components from the interleaved data
word in the second data word.

26. The system of Claim 25 wherein the merge module
is further configured to interleave the first and second
portions of the first data word in a single
instruction cycle of a computer processor.

27. The system of Claim 25 further comprising: 

a data component retrieving module operatively
coupled to the merge module and configured
to read the first data word from a buffer
stored in a memory of a computer.

28. The system of Claim 25 wherein the selected portion
of the first data component is a least significant
portion of the first data word; and

further wherein the selected portion of the second
data component is a least significant portion
of the first second data word.

29. The system of Claim 25 further comprising:

a data component storage module operatively
coupled to the data selection module and configured
to store the second data word in a destination
buffer in a memory of a computer.

30. The system of Claim 25 wherein the first portion of 
the first data word further includes a third one of the 
at least two data components; 

(a) further wherein the second portion of the
first data word further includes a fourth one of
the at least two data components;

(b) further wherein the interleaved data word
further includes a selected portion of the third
data component and a selected portion of the
fourth data component, the selected portions of
the third and fourth data components being
adjacent to one another within the interleaved
word;

(c) further wherein the selected portions of the
third and fourth data components represent the
third and fourth data components in the second
data format;

(d) further wherein the data recaster further
comprises:

(i) a second merge module different from
the first-mentioned merge module, operatively
coupled to the first merge module
and the data selection module, and configured
to interleave a first portion of the interleaved
word with a second portion of the
interleaved word to form a second interleaved
word in which the selected portions
of the first, second, third, and fourth data
components are substantially contiguous;
and

(e) further wherein the data selection module is
further configured to include the selected portions
of the third and fourth data components in
the second data word.